
    Complexity modelling for case knowledge maintenance in case-based reasoning.

    Case-based reasoning solves new problems by re-using the solutions of previously solved similar problems and is popular because many of the knowledge engineering demands of conventional knowledge-based systems are removed. The content of the case knowledge container is critical to the performance of case-based classification systems. However, the knowledge engineer is given little support in the selection of suitable techniques to maintain and monitor the case base. This research investigates the coverage, competence and problem-solving capacity of case knowledge with the aim of developing techniques to model and maintain the case base. We present a novel technique that creates a model of the case base by measuring the uncertainty in local areas of the problem space based on the local mix of solutions present. The model provides an insight into the structure of a case base by means of a complexity profile that can assist maintenance decision-making and provide a benchmark to assess future changes to the case base. The distribution of cases in the case base is critical to the performance of a case-based reasoning system. We argue that classification boundaries represent important regions of the problem space and develop two complexity-guided algorithms which use boundary identification techniques to actively discover cases close to boundaries. We introduce a complexity-guided redundancy reduction algorithm which uses a case complexity threshold to retain cases close to boundaries and delete cases that form single-class clusters. The algorithm offers control over the balance between maintaining competence and reducing case base size. The performance of a case-based reasoning system relies on the integrity of its case base, but in real-life applications the available data invariably contains erroneous, noisy cases. Automated removal of these noisy cases can improve system accuracy. In addition, error rates can often be reduced by removing cases to give smoother decision boundaries between classes. We show that the optimal level of boundary smoothing is domain dependent and, therefore, our approach to error reduction reacts to the characteristics of the domain by setting an appropriate level of smoothing. We introduce a novel algorithm which identifies and removes both noisy and boundary cases with the aid of a local distance ratio. A prototype interface has been developed that shows how the modelling and maintenance approaches can be used in practice in an interactive manner. The interface allows the knowledge engineer to make informed maintenance choices without the need for extensive evaluation effort while, at the same time, retaining control over the process. One of the strengths of our approach is in applying a consistent, integrated method to case base maintenance to provide a transparent process that gives a degree of explanation.
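
    The complexity-guided reduction step can be illustrated with a minimal sketch, assuming a simple neighbourhood-based complexity measure (the fraction of a case's nearest neighbours that carry a different class label); the thesis's exact complexity model, threshold and local distance ratio are not reproduced here.

```python
# Hypothetical sketch of complexity-guided redundancy reduction (not the thesis's exact algorithm).
# Local "complexity" is approximated as the fraction of a case's k nearest neighbours
# that carry a different class label: cases in single-class clusters score 0 and are
# deleted, cases near class boundaries score high and are retained.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def reduce_case_base(X, y, k=5, complexity_threshold=0.2):
    """Return indices of cases to retain in the case base."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)                  # idx[:, 0] is the case itself
    neighbour_labels = y[idx[:, 1:]]           # labels of the k neighbours
    complexity = (neighbour_labels != y[:, None]).mean(axis=1)
    return np.where(complexity >= complexity_threshold)[0]

# Usage: keep boundary cases, drop redundant members of single-class clusters.
X = np.random.rand(200, 4)
y = (X[:, 0] > 0.5).astype(int)
retained = reduce_case_base(X, y, k=5, complexity_threshold=0.2)
print(f"retained {len(retained)} of {len(X)} cases")
```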

    An Analysis of the Structure of the Fante Verb With Special Reference to Tone and Glottalisation.

    The tonal phonemes which occur in utterances containing only one sentence are (i) high tone, (ii) downstep between successive high tones, and (iii) a slight rise towards the end of a prepausal high tone. The phonemic status of the second and third of these is very largely accounted for by low tones becoming high in agreement with adjacent high tones; downstep is basically an automatic feature of the second of two high tones which are separated by one or more low tones, but if a low tone between two high tones becomes high in tonal agreement with the preceding or following high the downstep remains, occurring between the agreeing high and the high with which it is not in agreement. The slight rise towards the end of a prepausal high tone is basically an automatic feature of a high tone which is in pause and is borne by a tone-bearing unit without a final glottal stop, but if a low tone becomes high in pause in agreement with the preceding high it does not have the slight rise. The remaining occurrences of downstep and non-occurrences of the slight rise can be accounted for by the postulation of zero tone-bearing units with low or high tone (which mostly turn out to correspond to non-zero tone-bearing units in other dialects or languages). The glottal stop is an accentual rather than a consonantal phoneme. It sometimes represents a separate morpheme which might reasonably be looked upon as a morpheme of intonation, but apart from that it is basically an automatic feature of a tone-bearing unit of the pattern consonant-vowel-consonant which is in pause.
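
    As a toy illustration only (not an analysis of Fante), the basic automatic downstep rule described above can be sketched as follows; tonal agreement and the prepausal rise are deliberately left out.

```python
# Illustrative sketch only: marks the basic automatic downstep described in the abstract.
# A high tone ('H') is realised as a downstepped high ('!H') when the nearest preceding
# high tone is separated from it by one or more low tones ('L').
def mark_downstep(tones):
    out, seen_high, lows_since_high = [], False, 0
    for t in tones:
        if t == 'H':
            out.append('!H' if seen_high and lows_since_high >= 1 else 'H')
            seen_high, lows_since_high = True, 0
        else:
            out.append(t)
            lows_since_high += 1
    return out

print(mark_downstep(['H', 'L', 'H', 'H', 'L', 'L', 'H']))
# ['H', 'L', '!H', 'H', 'L', 'L', '!H']
```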

    CBR assisted context-aware surface realisation for data-to-text generation.

    Current state-of-the-art neural systems for Data-to-Text Generation (D2T) struggle to generate content that draws interesting insights from past events. This is because these systems have limited access to historic data and can also hallucinate inaccurate facts in their outputs. In this paper, we propose a CBR-assisted context-aware methodology for surface realisation in D2T that carefully selects important contextual data from past events and utilises a hybrid CBR and neural text generator to generate the final event summary. Through extensive experimentation on a sports domain dataset, we empirically demonstrate that our proposed method is able to accurately generate contextual content closer to human-authored summaries when compared to other state-of-the-art systems.
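
    A minimal sketch of the case-retrieval step, assuming cosine similarity over illustrative event feature vectors; the paper's actual features, similarity measure and neural generator are not shown.

```python
# Hypothetical sketch of the CBR step: retrieve similar past events whose summaries
# are then supplied as extra context to a neural surface realiser.
import numpy as np

def retrieve_context(event_vec, case_vecs, case_summaries, k=3):
    """Return summaries of the k most similar past events by cosine similarity."""
    norms = np.linalg.norm(case_vecs, axis=1) * np.linalg.norm(event_vec)
    sims = case_vecs @ event_vec / np.clip(norms, 1e-9, None)
    top = np.argsort(sims)[::-1][:k]
    return [case_summaries[i] for i in top]

# Usage: the retrieved summaries would be concatenated with the current event's
# records and passed to the neural generator as additional context.
case_vecs = np.random.rand(100, 8)                       # feature vectors of past events
case_summaries = [f"summary of past event {i}" for i in range(100)]
event_vec = np.random.rand(8)
print(retrieve_context(event_vec, case_vecs, case_summaries, k=3))
```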

    Content type profiling of data-to-text generation datasets.

    Data-to-Text Generation (D2T) problems can be considered as a stream of time-stamped events with a text summary being produced for each. The problem becomes more challenging when event summaries contain complex insights derived from multiple records, either within an event or across several events from the event stream. Understanding the different types of content present in a summary helps to define the system requirements needed to build better systems. In this paper, we propose a novel typology of content types that we use to classify the contents of event summaries. Using the typology, a profile of a dataset is generated as the distribution of the aggregated content types, which captures the specific characteristics of the dataset and gives a measure of the complexity present in the problem. Through extensive experiments on different D2T datasets we demonstrate that neural generative systems specifically struggle to generate content of complex types, highlighting the need for improved D2T techniques.
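
    A minimal sketch of building such a profile, assuming each summary has already been annotated with content-type labels; the label names below are illustrative, not the paper's typology.

```python
# Minimal sketch: a dataset "profile" as the distribution of content types aggregated
# across event summaries.
from collections import Counter

def profile(summaries_with_types):
    """summaries_with_types: list of lists of content-type labels, one list per summary."""
    counts = Counter(label for summary in summaries_with_types for label in summary)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

dataset = [
    ["basic", "basic", "intra_event"],          # summary 1 (illustrative labels)
    ["intra_event", "inter_event", "basic"],    # summary 2
]
print(profile(dataset))
# {'basic': 0.5, 'intra_event': 0.333..., 'inter_event': 0.166...}
```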

    Improving e-learning recommendation by using background knowledge.

    There is currently a large amount of e-Learning resources available to learners on the Web. However, learners often have difficulty finding and retrieving relevant materials to support their learning goals because they lack the domain knowledge to craft effective queries that convey what they wish to learn. In addition, the unfamiliar vocabulary often used by domain experts makes it difficult to map a learner's query to a relevant learning material. We address these challenges by introducing an innovative method that automatically builds background knowledge for a learning domain. In creating our method, we exploit a structured collection of teaching materials as a guide for identifying the important domain concepts. We enrich the identified concepts with text discovered from an encyclopedia, thereby increasing the richness of our acquired knowledge. We employ the developed background knowledge to influence the representation and retrieval of learning resources and so improve e-Learning recommendation. The effectiveness of our method is evaluated using a collection of Machine Learning and Data Mining papers. Our method outperforms the benchmark, demonstrating the advantage of using background knowledge for improving the representation and recommendation of e-Learning materials.
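
    A minimal sketch of how acquired background knowledge could influence retrieval, assuming a toy concept-to-vocabulary map and simple overlap-based query expansion; the paper's concept discovery and representation methods are not reproduced.

```python
# Hypothetical sketch: each domain concept carries vocabulary mined from encyclopedia
# text, and a learner's query is expanded with the vocabulary of its best-matching
# concept. The concept data and matching scheme are illustrative assumptions.
background = {
    "decision trees": {"entropy", "information gain", "pruning", "splitting"},
    "clustering": {"k-means", "centroid", "dendrogram", "silhouette"},
}

def expand_query(query):
    tokens = set(query.lower().split())
    # pick the concept whose enriched vocabulary (plus its name) overlaps the query most
    best = max(background, key=lambda c: len(tokens & (background[c] | set(c.split()))))
    return tokens | background[best]

print(expand_query("how does pruning work in trees"))
```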

    Music recommendation: audio neighbourhoods to discover music in the long tail.

    Millions of people use online music services every day and recommender systems are essential to browse these music collections. Users are looking for high-quality recommendations, but also want to discover tracks and artists that they do not already know, newly released tracks, and the more niche music found in the 'long tail' of online music. Tag-based recommenders are not effective in this 'long tail' because relatively few people are listening to these tracks and so tagging tends to be sparse. However, similarity neighbourhoods in audio space can provide additional tag knowledge that is useful to augment sparse tagging. A new recommender exploits the combined knowledge, from audio and tagging, using a hybrid representation that extends the track's tag-based representation by adding semantic knowledge extracted from the tags of similar music tracks. A user evaluation and a larger experiment using Last.fm user data both show that the new hybrid recommender provides better quality recommendations than using only tags, together with a higher level of discovery of unknown and niche music. This approach of augmenting the representation for items that have missing information, with corresponding information from similar items in a complementary space, offers opportunities beyond content-based music recommendation.
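
    A minimal sketch of the augmentation idea, assuming audio feature vectors and a tag-count matrix; the neighbourhood size and weighting are illustrative choices, not the paper's formulation.

```python
# Illustrative sketch: augment a sparsely tagged track with tag knowledge drawn from
# its audio neighbourhood (nearest neighbours in audio feature space).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def hybrid_tags(tag_matrix, audio_features, k=5, neighbour_weight=0.5):
    """tag_matrix: (n_tracks, n_tags) counts; audio_features: (n_tracks, d)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(audio_features)
    _, idx = nn.kneighbors(audio_features)
    neighbour_tags = tag_matrix[idx[:, 1:]].mean(axis=1)   # average neighbour tag profile
    return tag_matrix + neighbour_weight * neighbour_tags

tags = np.random.binomial(1, 0.05, size=(50, 20)).astype(float)   # sparse tag matrix
audio = np.random.rand(50, 10)                                     # e.g. texture features
print(hybrid_tags(tags, audio).shape)                              # (50, 20)
```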

    Case-based approach to automated natural language generation for obituaries.

    Automated generation of human-readable text from structured information is challenging because grammatical rules are complex, making good-quality output difficult to achieve. Textual Case-Based Reasoning provides one approach in which the text from previously solved examples with similar inputs is reused as a template solution to generate text for the current problem. Natural Language Generation also poses a challenge when evaluating the quality of the generated text, due to the high cost of human labelling and the variety of potential good-quality solutions. In this paper, we propose two case-based approaches for reusing text to automatically generate an obituary from a set of input attribute-value pairs. The case base is acquired by crawling and then tagging existing solutions published on the web to create cases as problem-solution pairs. We evaluate the quality of the text generation system with a novel unsupervised case alignment metric using normalised discounted cumulative gain, which is compared to a supervised approach and human evaluation. Initial results show that our proposed evaluation measure is effective and correlates well with average attribute error evaluation, which is a crude surrogate for human feedback. The system is being deployed in a real-world application with a startup company in Aberdeen to produce automated obituaries.
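
    A minimal sketch of the textual reuse step, assuming a toy case base of attribute-value pairs with associated text; the crawling, tagging and nDCG-based alignment evaluation are not shown.

```python
# Minimal sketch: retrieve the case sharing the most attributes with the query, then
# substitute the query's attribute values into the retrieved text. Case structure and
# placeholder values are illustrative.
case_base = [
    {"attrs": {"name": "John Smith", "age": "82", "town": "Aberdeen"},
     "text": "John Smith, 82, of Aberdeen, passed away peacefully."},
]

def generate(query_attrs):
    case = max(case_base, key=lambda c: len(query_attrs.keys() & c["attrs"].keys()))
    text = case["text"]
    for key, old_value in case["attrs"].items():
        if key in query_attrs:
            text = text.replace(old_value, query_attrs[key])
    return text

print(generate({"name": "Mary Brown", "age": "90", "town": "Inverness"}))
# "Mary Brown, 90, of Inverness, passed away peacefully."
```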

    Angles of vision: digital storytelling on the cosmic tide?

    In this report, a collaboration between Robert Gordon University and the University of the Highlands and Islands Institute for Northern Studies, the authors bring together findings from four workshops hosted as part of the My Orkney Story project. The report aims to address the opportunities and challenges of developing digital storytelling platforms through the lens of Orkney as a case study. However, the findings are also intended to have a wider relevance to the development and implementation of digital storytelling platforms at both local and international levels.

    Harnessing background knowledge for e-learning recommendation.

    The growing availability of good quality, learning-focused content on the Web makes it an excellent source of resources for e-learning systems. However, learners can find it hard to retrieve material well aligned with their learning goals because of the difficulty in assembling effective keyword searches, due to both an inherent lack of domain knowledge and the unfamiliar vocabulary often employed by domain experts. We take a step towards bridging this semantic gap by introducing a novel method that automatically creates custom background knowledge in the form of a set of rich concepts related to the selected learning domain. Further, we develop a hybrid approach that allows the background knowledge to influence retrieval in the recommendation of new learning materials by leveraging the vocabulary associated with our discovered concepts in the representation process. We evaluate the effectiveness of our approach on a dataset of Machine Learning and Data Mining papers and show it to outperform the benchmark methods. This paper won the Donald Michie Memorial Award for Best Technical Paper at AI-2016.
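
    A minimal sketch of the hybrid retrieval idea, assuming a toy concept vocabulary and a simple weighted combination of direct term overlap and concept-bridged overlap; the weighting and concept data are illustrative assumptions, not the paper's method.

```python
# Hypothetical sketch: score a learning resource by combining direct term overlap with
# overlap mediated by the concept vocabulary of the background knowledge, so a query
# and document can match even when they share no terms directly.
concept_vocab = {
    "neural networks": {"backpropagation", "perceptron", "activation", "weights"},
}

def hybrid_score(query_terms, doc_terms, alpha=0.6):
    direct = len(query_terms & doc_terms)
    # concept bridge: query and document both touch the same concept's vocabulary
    bridged = sum(
        1 for vocab in concept_vocab.values()
        if query_terms & vocab and doc_terms & vocab
    )
    return alpha * direct + (1 - alpha) * bridged

q = {"how", "backpropagation", "works"}
d = {"gradient", "descent", "weights", "perceptron"}
print(hybrid_score(q, d))   # 0.4: no direct overlap, but linked via the concept
```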

    Music-inspired texture representation.

    Techniques for music recommendation are increasingly relying on hybrid representations to retrieve new and exciting music. A key component of these representations is musical content, with texture being the most widely used feature. Current techniques for representing texture, however, are inspired by speech rather than music, so music representations do not capture the true nature of musical texture. In this paper we investigate two parts of the well-established mel-frequency cepstral coefficients (MFCC) representation: the resolution of mel-frequencies related to the resolution of musical notes; and how best to describe the shape of texture. Through contextualizing these parts, and their relationship to music, a novel music-inspired texture representation is developed. We evaluate this new texture representation by applying it to the task of music recommendation. We use the representation to build three recommendation models, based on current state-of-the-art methods. Our results show that by understanding two key parts of texture representation, it is possible to achieve a significant recommendation improvement. This contribution of a music-inspired texture representation will not only improve content-based representation, but will allow hybrid systems to take advantage of a stronger content component.
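
    A minimal sketch of the idea of tying the texture analysis to musical pitch, using librosa; the note range, band count and number of coefficients are illustrative assumptions, not the representation proposed in the paper.

```python
# Illustrative sketch only: compute MFCC-style texture features with the mel filterbank
# resolution and frequency range aligned to musical notes rather than speech defaults.
import librosa

y, sr = librosa.load(librosa.example("trumpet"))

fmin = librosa.note_to_hz("C1")      # align the analysed range to musical pitch
fmax = librosa.note_to_hz("C8")
n_mels = 84                          # roughly one band per semitone over seven octaves

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_mels=n_mels, fmin=fmin, fmax=fmax)
print(mfcc.shape)                    # (13, n_frames): one texture descriptor per frame
```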